The advent of the era of Big Data has allowed many researchers to dig into various socio-technical systems, including social media platforms. In particular, these systems have provided them with verifiable means to look into certain aspects of human behavior. In this work, we are specifically interested in the behavior of individuals on social media platforms: how they handle the information they get, and how they share it. We look into Twitter to understand the dynamics behind the users’ posting activities (tweets and retweets), zooming in on topics that peaked in popularity. Three mechanisms are considered: endogenous stimuli, exogenous stimuli, and a mechanism that dictates the decay of interest of the population in a topic. We propose a model involving two parameters, $\eta^*$ and $\lambda$, describing the tweeting behaviour of users, which allows us to reconstruct the findings of Lehmann et al. (2012) on the temporal profiles of popular Twitter hashtags. With this model, we are able to accurately reproduce the temporal profile of user engagements on Twitter. Furthermore, we introduce an alternative way of classifying the collective activities on the socio-technical system based on the model.
I. INTRODUCTION


The study of information diffusion, from gossip spreading [?] to the propagation of viral memes [?], fads, and trends [?], and even word-of-mouth marketing [?], has become increasingly interesting, especially in this era of “Big Data.” Current technologies and methods have allowed researchers to look more closely into the social network fabric, the medium in which the proliferation of various entities takes place. Questions relating to how fast information travels or what kind of information captures the largest audience have piqued the interest of many researchers [?]. Various approaches have been implemented to shed light on these questions. Researchers have looked into the role of a network’s degree of connectivity, modularity, and various centrality measures, among other things [?]. Efforts have also been put into understanding the degree of social “influence” of entities on each other [?]. Many have also investigated the nature of topics that are being diffused in a social system.

In this work, we propose a model that aims to capture the various aspects of these approaches: we do not only look at the network structure in isolation, but also augment it with particulars on the nature of the information being spread, and the individuals’ tendencies to spread such information or “inject” new information. In particular, we investigate the observations described in [14] on the dynamical classes of collective attention in Twitter, where four groups of hashtags were defined depending on the temporal features of their popularity dynamics. We initially introduce two free parameters intrinsic to the users’ behaviours, $\lambda$ and $\eta^*$, where $\lambda$ quantifies the rate of decay at which a user would spread a given piece of information and $\eta^*$ is the threshold that determines whether or not an agent propagates information from the users he/she follows. The rules defined are then implemented on an empirical Twitter network obtained from the Stanford Large Network Dataset Collection [18].

This paper is structured as follows: we first describe the data and model in Sec. II, then present the results and discussions in Sec. III, and finally summarise and establish our conclusions in Sec. IV.


II. DATA AND THE MODEL 

A. Data 

The dataset utilised here is a set of 115 hashtags used by Lehmann et al. in [14]. It contains the time series of the number of tweets and distinct users for each of the hashtags. Each time series is centred on the day on which the number of relevant tweets attains its maximum “popularity,” and spans from seven days before to seven days after the day of the peak. The full data collected in [14] contain 130 million Twitter messages appearing in the period of approximately 6 months from November 20, 2008 to May 27, 2009. We point the readers to reference [14] for further details on the dataset utilised here for model fitting and verification. [26]
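
As an illustration of the windowing described above, the following minimal Python sketch cuts a 15-day series (seven days before the peak to seven days after) out of a longer daily count series. The function name `peak_window` and the sample counts are hypothetical and are not part of the dataset of [14]; the sketch also assumes that the peak day is simply the day with the largest count.

```python
from typing import List, Sequence


def peak_window(daily_counts: Sequence[int], half_width: int = 7) -> List[int]:
    """Return the counts from `half_width` days before the peak day
    to `half_width` days after it (2 * half_width + 1 values)."""
    # Assumption: the peak day is the first day with the maximum count.
    peak = max(range(len(daily_counts)), key=lambda day: daily_counts[day])
    start, stop = peak - half_width, peak + half_width + 1
    if start < 0 or stop > len(daily_counts):
        raise ValueError("series too short to centre a full window on the peak")
    return list(daily_counts[start:stop])


# Made-up daily tweet counts, for illustration only.
counts = [1, 2, 2, 3, 5, 8, 13, 40, 21, 9, 6, 4, 3, 2, 1, 1, 0]
window = peak_window(counts)
```

Here `window` holds 15 values with the peak in the middle position, which is the layout of each time series used for fitting.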


B. The model 

1. Definitions and rules 

The model is defined on a general network $\mathcal{N}$ with $N$ nodes, each node representing a user. Each user $i$ has $F_i$ “followers” and $L_i$ “leaders” whom he/she follows. This leader-follower relationship results in a directed network. It is also worth noting that although the Twitter network structure changes dynamically in the real world, here we only consider a static structure given the relatively short time frame we are considering, which is two weeks. Note that when a user $i$ follows another user $j$, the follower sees all the tweets that $j$ posts; if, on the other hand, user $i$ visits the profile page of $j$, $i$ will see not only the tweets, but also the retweets and replies that user $j$ posts.

Three mechanisms are incorporated in our model. Two of them, exogenous and endogenous, define the manner in which information is propagated in the system [7, ?]. The endogenous process involves a re-posting of someone else’s tweet (“retweet”), thereby propagating/diffusing the same tweet across the social network. On the other hand, when new information is “injected” into the social network system, an exogenous process is said to have taken place. In addition to these two mechanisms, a third one accounts for the decay of the level of the activities involving a specific topic on Twitter. To summarise, our model incorporates these three processes: (1) injection of new information into the network, (2) spreading of information in the network, and (3) decay of information after a peak.

The key features of the model proposed are quantified in two parameters, $\eta^*$ and $\lambda$, characterising the spreading of information and the decay of activities in the network, respectively. The parameter $\eta^*$ quantifies the threshold of influence of leaders on their followers, determining whether or not a follower would take action such as retweeting and/or replying to a tweet, consequently exposing his/her own followers to the information. In other words, $\eta^*$ encapsulates the level of contagion of a piece of information in the network. On the other hand, the parameter $\lambda$ quantifies the rate of decay of interest of a user in the information after a certain point in time. In our model, the build-up in activities before a topic’s peak in popularity is reflected solely by the parameter $\eta^*$, while the decay in activities after the peak results from the interplay between the two parameters $\eta^*$ and $\lambda$.

To make the model results comparable with the data we have at hand, we use the scale of one day as one time unit. The rules and flowchart of implementation of the model are described in Fig. 1. The model is updated sequentially, i.e. the state of a user $i$ at time $t$ depends only on the state of the network before time $t$, not at time $t$.


2. Assumptions 


The model constructed makes the following assumptions on the tendency of a user to tweet and retweet. A user posts an original [27] tweet if he/she is exposed to some new information outside of his/her Twitter network, i.e. from external sources (or has some original ideas to share). A user who follows a lot of other users tends to rely solely on his/her social network for information and, hence, retweets more often than he/she “injects” new information from external sources. In contrast, a user who has a huge following tends to be more active in posting original ideas or new tweets rather than just reposting others’. These assumptions on tendencies are illustrated in Fig. 2.

Let us consider a user $i$ ($i = 1, 2, \ldots, N$) who follows $L_i$ leaders $l(i,j)$ ($j = 1, 2, \ldots, L_i$) and who has $F_i$ followers. The probability that user $i$ is exposed to external sources is

$$P_i(t) = A_i \, \chi(t - t_0), \qquad (1)$$

in which $A_i$ represents the activeness of $i$ in following news and propagating it to other people, and $\chi(t - t_0)$ the coverage by the media. In general, the temporal profile of external media coverage satisfies the limiting conditions

$$1 \ge \chi(x) \ge 0 \;\; \forall x, \qquad \chi(0) = 1, \qquad \lim_{|x| \to \infty} \chi(x) = 0. \qquad (2)$$

We, however, assume that within a narrow window of time around the event, the media coverage is consistent and stays approximately constant so that $\chi(x \approx 0) \approx 1$. By the assumption described above, the activeness $A_i$ takes the form



max 



(3) 


to reflect the assumption that a user having more followers tends to be active in following news and can introduce interesting content, but that this is offset by having many leaders, as in such a case the user tends to rely on the leaders for information rather than tweeting himself/herself, as illustrated in Fig. 2(a) (see, for example, [?]).

Upon external exposure, the probability $T_i$ of a user $i$ to tweet depends on: (1) the interest $a_i$ of the user in the nature of the information or the particular topic under consideration, (2) the level of interest $\tau_i(t - t_0)$ as a function of time, and (3) his/her hesitancy to tweet $H_i$:

$$T_i(t - t_0) = a_i \, \tau_i(t - t_0) - H_i. \qquad (4)$$








FIG. 1: Rules of the model proposed in this work (flowchart looping, for every time step, over each user of the network). $t_0$ is the day of the peak and $\eta$ is the amount of activities by the user’s leaders accumulated after his/her last tweet.


The level of interest $\tau_i(t - t_0)$ is high during and before the event, and decays with rate $\lambda$ after the event:

$$\tau(x) = \begin{cases} 1 & \text{if } x \le 0, \\ \exp(-\lambda x) & \text{if } x > 0. \end{cases} \qquad (5)$$
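
The piecewise form of the level of interest can be written as a short Python function. This is only a sketch for illustration: the name `interest_level` is hypothetical, and the boundary case $x = 0$ (the peak day itself) is assumed to belong to the constant branch, following the statement that interest is high during and before the event.

```python
import math


def interest_level(x: float, lam: float) -> float:
    """Level of interest tau(x), with x = t - t0 in days.

    Constant (equal to 1) up to and including the peak day, then an
    exponential decay with rate `lam` afterwards.
    """
    return 1.0 if x <= 0 else math.exp(-lam * x)
```

With `lam = 0.5`, for instance, the interest two days after the peak is `exp(-1)`, roughly 0.37, while every day before the peak returns 1.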


The hesitancy to tweet (and also to retweet) depends on the number of leaders and followers a user has, as illustrated in Fig. 2(b). The fewer leaders or followers a user has, the more hesitant he/she is to retweet because of the lack of engagement and/or motivation to do so. Hence,

$$H_i = \frac{1}{L_i + F_i}. \qquad (6)$$

Here, we also assume that $a_i = 1$, indicating that we only focus on the topics that are of interest to the users.
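
To make the combination of interest and hesitancy concrete, the sketch below evaluates the tweeting probability of Eq. (4) for a single user. It rests on several assumptions that are not stated in the text: the hesitancy is taken to be $1/(L_i + F_i)$, a user with no leaders and no followers is treated as maximally hesitant, and the result is clipped to the interval [0, 1] so that it can be used as a probability. The function names are hypothetical.

```python
import math


def hesitancy(n_leaders: int, n_followers: int) -> float:
    """Hesitancy H_i, assumed here to be 1 / (L_i + F_i)."""
    total = n_leaders + n_followers
    # Assumption: an isolated user (no leaders, no followers) never tweets.
    return 1.0 if total == 0 else 1.0 / total


def tweet_probability(days_from_peak: float, lam: float,
                      n_leaders: int, n_followers: int,
                      a: float = 1.0) -> float:
    """T_i = a * tau(t - t0) - H_i, clipped to [0, 1] (clipping is an assumption)."""
    tau = 1.0 if days_from_peak <= 0 else math.exp(-lam * days_from_peak)
    p = a * tau - hesitancy(n_leaders, n_followers)
    return min(1.0, max(0.0, p))
```

For a user with five leaders and five followers on the peak day, this gives 1 - 0.1 = 0.9; long after the peak the exponential term falls below the hesitancy and the clipped probability is 0.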

Next, we define the average influence of all leaders of a user $i$ as

$$I_i^* = \frac{1}{L_i} \sum_{j=1}^{L_i} F_{l(i,j)}, \qquad (7)$$

in which $F_{l(i,j)}$ is the number of followers that the leader $l(i,j)$ has.

In addition, we quantify the amount of exposure user $i$ has to the influence of his/her leaders in the following equation:

$$I_i(t) = \sum_{\substack{\text{all leaders } l(i,j) \\ \text{having tweeted} \\ \text{recently before } t}} F_{l(i,j)}. \qquad (8)$$
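
The two leader-based quantities defined above can be sketched in a few lines of Python. The data layout (a dictionary of follower counts and a set of leaders who tweeted recently) and the function names are hypothetical choices made for this illustration; the average influence is read here as the mean follower count over all leaders of the user, which is an interpretation of the text rather than a confirmed formula.

```python
from typing import Dict, Iterable, Set


def average_leader_influence(leaders: Iterable[str],
                             follower_count: Dict[str, int]) -> float:
    """Mean number of followers over all leaders of a user (assumed reading)."""
    leaders = list(leaders)
    if not leaders:
        return 0.0
    return sum(follower_count[leader] for leader in leaders) / len(leaders)


def exposure(leaders: Iterable[str],
             follower_count: Dict[str, int],
             tweeted_recently: Set[str]) -> int:
    """Sum of follower counts over the leaders who tweeted recently."""
    return sum(follower_count[leader]
               for leader in leaders if leader in tweeted_recently)


# Made-up example: three leaders, two of whom tweeted recently.
followers = {"a": 10, "b": 30, "c": 20}
my_leaders = ["a", "b", "c"]
current_exposure = exposure(my_leaders, followers, {"a", "c"})
mean_influence = average_leader_influence(my_leaders, followers)
```

In this example the exposure is 10 + 20 = 30 and the average influence is 20.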


And the necessary condition for retweeting is 



(9) 


Once this condition is met, user $i$ retweets with probability

$$R_i(t - t_0) = a_i \, \tau_i(t - t_0) - H_i, \qquad (10)$$

which takes the same form as Eq. (4), in which $H_i$ represents the hesitancy as described in Eq. (6).

The number of leaders who tweeted recently, i.e. after the user’s last tweet and before the current time $t$, is denoted as $\eta_i(t)$. The total number of possible retweets by user $i$ at time $t$ is given by


Mt) 




( 11 ) 


in which we only take the integer part and take 0 as 1 
because the number of retweets is at least 1 if the user 
retweets. 

If the user retweets, it does not necessarily mean that he/she would retweet all $n$ tweets. The probability to retweet $R$ means that he/she retweets at least one tweet. Treating the $n$ possible retweets as independent, the probability of at least one retweet is $1 - (1 - r)^n = R$, so each of the $n$ possible retweets carries probability $r = 1 - \sqrt[n]{1 - R}$.
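
The relation between the overall retweet probability $R$ and the per-tweet probability $r$ can be checked numerically. The sketch below assumes that the $n$ possible retweets are independent, so that the probability of at least one retweet is $1 - (1 - r)^n$; the function name is hypothetical.

```python
def per_retweet_probability(R: float, n: int) -> float:
    """Per-tweet probability r such that 1 - (1 - r)**n == R,
    assuming the n possible retweets are independent."""
    if not 0.0 <= R <= 1.0:
        raise ValueError("R must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - R) ** (1.0 / n)


r = per_retweet_probability(0.75, 2)   # 1 - sqrt(0.25) = 0.5
at_least_one = 1.0 - (1.0 - r) ** 2    # recovers R = 0.75
```

With $R = 0.75$ and $n = 2$, each possible retweet carries probability 0.5, and the chance of at least one retweet is again 0.75. For $n = 1$ the per-tweet probability equals $R$.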

By identifying the two key parameters $\lambda$ and $\eta^*$, we can expect to observe four different types of users’ behaviour in response to an event, as illustrated in Fig. 3. The four types correspond to four quadrants in the ($\lambda$, $\eta^*$) parameter space, namely lowly contagious-slow decaying, lowly contagious-fast decaying, highly contagious-slow decaying and highly contagious-fast decaying.
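
The four quadrants can be expressed as a simple lookup on the two parameters. In the sketch below, the cut-off values `lam_cut` and `eta_cut` are hypothetical (the text does not give numerical boundaries between the quadrants), and a low threshold $\eta^*$ is read as high contagion because information then spreads more easily.

```python
def event_type(lam: float, eta_star: float,
               lam_cut: float = 2.0, eta_cut: float = 30.0) -> str:
    """Label the quadrant of the (lambda, eta*) space an event falls in.

    `lam_cut` and `eta_cut` are illustrative cut-offs, not values from the paper.
    """
    # Low threshold means information spreads easily, i.e. high contagion.
    contagion = "highly contagious" if eta_star < eta_cut else "lowly contagious"
    decay = "fast decaying" if lam >= lam_cut else "slow decaying"
    return f"{contagion}-{decay}"
```

For example, `event_type(0.5, 5)` returns `"highly contagious-slow decaying"` and `event_type(3.5, 50)` returns `"lowly contagious-fast decaying"`.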
(a) Tweeting behaviour of different types of Twitter users based on their number of leaders and followers. Each type corresponds to the likelihood of being exposed to external media.

(b) Retweeting hesitancy of different types of Twitter users based on their number of leaders and followers. The arrows indicate the directions of increasing hesitancy, i.e. when the number of leaders or followers decreases.

FIG. 2: Behaviour patterns of different types of users according to their number of followers and leaders.

FIG. 3: Distribution of different types of event in the ($\lambda$, $\eta^*$) parameter space.


III. RESULTS AND DISCUSSIONS

The empirical network we use for simulation was obtained from the Stanford Large Network Dataset Collection [18]. The entire network is a combination of 1,000 ego networks with 81,306 nodes and 1,768,149 links, a diameter of 7, and a clustering coefficient of 0.5653. We run the simulation starting from $\delta t$ days before a topic peaks in popularity at $t_0$ (we also refer to this as the “event”) until 7 days after $t_0$. $\delta t$ can vary from 0 to 7, mimicking the fact that the amount of activities related to an event becomes significant up to $\delta t$ days before the event. $\delta t = 0$ corresponds to sudden events while a large value of $\delta t$ indicates an anticipated one. It is noteworthy that by varying $\delta t$, we effectively include a third parameter in our model, which characterises the injection of information into the network.

We then scan the ($\lambda$, $\eta^*$) parameter space in steps of $\Delta\lambda = 0.1$ ($\lambda \in [0, 4]$) and $\Delta\eta^* = 1$ ($\eta^* \in [1, 60]$) to produce different time series for the number of tweets as well as the number of (distinct) users every day, and identify the ones that reproduce the empirical observations by using the distance metric introduced below. Since this is a Monte Carlo simulation that involves the generation of random numbers, we perform 50 runs with distinct seeds for the random number generator for each set-up, i.e. the triplet ($\delta t$, $\eta^*$, $\lambda$), and take the average results.


A. Validation of the model


We compare the data generated by our model to the empirical data by calculating the matching score of the two profiles, which are quantified by the fraction of users or tweets on a single day. In detail, let $P = (P_1, P_2, \ldots, P_N)$ be the profile of the tweets produced by our model, i.e. $P_i$ is the fraction of tweets on day $t_i$ within the entire period from $t_1$ to $t_N$. By definition, we have

$$\sum_{i=1}^{N} P_i = 1. \qquad (12)$$

Similarly, $Q = (Q_1, Q_2, \ldots, Q_N)$ is the corresponding profile of the tweets in the data collected by [14].

We compare $P$ and $Q$ by introducing the metric

$$\delta(P, Q) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{P_i - Q_i}{\max(P_i, Q_i)} \right)^2}, \qquad (13)$$

which quantifies the (normalised) “distance” between the two profiles. When the two profiles are identical, $P = Q$, i.e. $P_i = Q_i \;\; \forall i = 1, 2, \ldots, N$, the distance is $\delta(P, Q) = 0$. The measure is normalised so that the maximum possible value of $\delta$ is 1.


In Eq. (13), when $P_i = Q_i = 0$, the term $\left( \frac{P_i - Q_i}{\max(P_i, Q_i)} \right)^2$ does not have any contribution to $\delta$.

Finally, we set a tolerance threshold $\theta = 0.04$ such that all the terms with $P_i + Q_i < \theta$ do not have any contribution to $\delta$.
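
A minimal Python sketch of this comparison is given below. It assumes that the metric is the square root of the mean squared relative difference, with terms below the tolerance skipped but still counted in the normalisation by $N$; both points are a reading of the text rather than a confirmed implementation. The function names `to_profile` and `profile_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def to_profile(counts: Sequence[float]) -> List[float]:
    """Turn daily counts into fractions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalise an all-zero series")
    return [c / total for c in counts]


def profile_distance(P: Sequence[float], Q: Sequence[float],
                     theta: float = 0.04) -> float:
    """Normalised distance between two profiles of equal length.

    Days with P_i + Q_i < theta (including P_i = Q_i = 0) contribute nothing.
    """
    if len(P) != len(Q):
        raise ValueError("profiles must have the same length")
    acc = 0.0
    for p, q in zip(P, Q):
        if p + q < theta or max(p, q) == 0:
            continue  # below tolerance, or both entries are zero
        acc += ((p - q) / max(p, q)) ** 2
    return math.sqrt(acc / len(P))


model = to_profile([1, 2, 5, 2])
same = profile_distance(model, model)                 # identical profiles
opposite = profile_distance([1.0, 0.0], [0.0, 1.0])   # fully mismatched profiles
```

Identical profiles give a distance of 0, and two profiles with no overlapping activity give the maximum value of 1, in line with the normalisation stated above.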

Using the metric introduced above and after visually verifying the plots (Fig. 4), we consider fits with $\delta(P, Q) < 0.08$ good and discard the rest. Of the 115 hashtags, about 77% (88/115) result in good fits, both for the number of users and the number of retweets. The remaining 23% fall into the groups with activities distributed before and symmetrically around the peak day [14], which have significant amounts of activity prior to the events. This demonstrates that the proposed model, although capable of capturing the main features of the collective attention build-up and decay before and after the event day, requires an additional framework that would quantify the “sense of time” of the users, i.e. whether or not an event is approaching [?]. This aspect will be investigated and reported elsewhere.

It is worth noting that while it is not straightforward to know how many times a user would tweet or retweet in a day, we have shown that our assumptions in Sec. II B 2 for the users’ activities work well in estimating both the number of users and retweets in most cases. Moreover, the fact that we could reproduce the temporal profiles of activities (see Fig. 4) using our model with only two user-intrinsic parameters and an effective third parameter for external factors supports our assumptions and hypotheses in identifying the key mechanisms of information spreading in social networks.


B. Classification of hashtag types

With the estimated parameter values, we generate the plot for the distribution of the hashtags on the two-dimensional parameter space of $\eta^*$ and $\lambda$, as shown in Fig. 5. From the plot, we can observe the clustering pattern corresponding to the different types of event shown in Fig. 3, with only a few outliers. It is quite evident that there is a cluster of large points at the bottom left corner of the plot, which corresponds to events that quickly go viral and last long. Those events appear many days before the peak and generate a significant amount of activity afterward. The other three clusters contain small points, signifying events that start not long before their peak of activity.

As illustrated by the colours of the data points in Fig. 5, we can also observe that the distribution of the points corresponds very well to the dynamical classes reported in [14], i.e. the points for each of the four classes can be segregated into distinct clusters (with the exception of a few points in the class of activities concentrated before the peak, see below). The four classes are called A, B, P and S, respectively, in this work for convenience of the discussion. Class A describes events where the associated activities are concentrated after a topic peaks in popularity. Class B, on the other hand, refers to events where the activities occur before the peak. Class P consists of events where the activities are concentrated on a single day. Finally, Class S contains events that have significant activities before, on and after the peak day. Our results show that the clusters described above also reveal the existence of subclasses within each of the classes. In Fig. 5, we can generally identify 7 clusters of data points (or hashtags) which show very good correspondence to the classification in [14].

From the fittings, we can observe two subgroups in the class with activities concentrating after the peak, i.e. class A (after). One group shows long range behaviours in which the activities span over a long period of time, reflected by slow decay of interest (small $\lambda$) but high spreading threshold (large $\eta^*$). The other group shows short range behaviours in which the activities span over a very short period of time, reflected by low spreading threshold (small $\eta^*$) but very fast decay of interest (large $\lambda$).

For the class with activities concentrating before the peak, i.e. class B (before), we also observe two subgroups. One group shows long range behaviours in which the activities span over a long period of time, reflected by long appearance before the peak but high spreading threshold (large $\eta^*$). The other group shows short range behaviours in which the activities span over a very short period of time, reflected by very short appearance before the peak but very low spreading threshold (small $\eta^*$).

For the class with activities concentrating at the peak, i.e. class P (peak), the values of the parameters suggest two subgroups, both of which have very fast decay of interest (large $\lambda$). One group shows contagious behaviours in which the events appear very shortly before the peak but generate a lot of activities due to low spreading threshold (small $\eta^*$). The other group shows inert behaviours due to very high spreading threshold (large $\eta^*$).

The class with activities distributed symmetrically around the peak, i.e. class S (symmetric), generally has low spreading threshold (small $\eta^*$) and slow decay of interest (small $\lambda$).

In Fig. 4, we show the different profiles for each of the classes described above.

FIG. 4: Time series of activities (top) and users (bottom). Results from the model (blue) shown together with the data (red) presented in [14] for classes A, B, P, and S, respectively.


C. Content analysis 

After revealing the existence of the classes and subclasses of the hashtags, we turn to the content of each hashtag and examine how it is related to the apparent classification. In Appendix A, we have a table showing the hashtags together with their corresponding type and class (and subclass, according to our results above). The table is organised in such a way that the top rows contain the “simple” hashtag types, in the sense that the hashtags of those types generally belong to one class identified by our model. The rows further down at the bottom of the table contain more complicated hashtag types whose tweets fall into different classes.

From the table, it can be seen that hashtags in the categories of activism (#ie6, #pman) or technology (#safari, #safari4, #skype) indicate events that capture attention over a long period of time and make an impact that keeps people discussing. These events either call for attention on a particular matter, e.g. a campaign, or are of great interest and impact to many people, e.g. technology products. The peak in these events is usually associated with a symbolic or iconic activity on that day, e.g. a rally of people in a place or the release of a product. The hashtags in the category of charity (#twestival, #protest) indicate events that generate activities before a peak but soon decay after that. This is because these events usually call for people’s support to achieve a certain goal (e.g. fund raising, signature collection), and once the goal has been achieved, people are no longer interested in the follow-up. The hashtags in the category of marketing generally exhibit sudden appearance. That could be explained by the strategies of marketers releasing incentives to advertise their products. Our results show, however, that the dynamical behaviour of people’s attention also depends on the type of product and how it is advertised.

FIG. 5: Fitted parameters $\eta^*$ and $\lambda$ showing clustering patterns. The circles of larger size correspond to large values of $\delta t$. The colours (online) of the data points are determined by the classes identified in [14]: red for S, black for P, blue for A and green for B.

The hashtags in other categories generally spread across different classes with no easy way of relating the content to the class. Nevertheless, content types like the Twitter (word) games spontaneously started by some user(s), which appear in all of the classes and subclasses identified in this work, could provide a very useful set-up to study what type of content would become popular in a social setting [?]. Further analysis of the meaning of the hashtags and the content of the tweet messages containing the hashtags will be explored and reported elsewhere.


D. Discussions 

The classification of hashtags allows us to identify their general features in terms of how people react to the information they receive and also possibly infer their content. Overall, class S (symmetric) occupies the bottom left quadrant of the parameter space ($\lambda$, $\eta^*$). In this quadrant, the threshold $\eta^*$ is low and the rate of decay $\lambda$ is also low. These values correspond to events that can easily spread (due to low threshold) and can last after a topic peaks in popularity (low rate of decay), e.g. movie (#watchmen), technology release (#safari, #skype) or activism (#pman). Our model in this study can reconstruct the data very well up to $\delta t = 4$ days before the peak but generally breaks down beyond that. This suggests a different pattern in people’s behaviour in spreading the information when the “sense of time” is relevant, i.e. before and near the event associated with the information.

On the other hand, class P (peak) occupies the right half of the parameter space, which corresponds to events that decay very quickly after the peak. They can further be categorised into two groups: the upper one (high threshold $\eta^*$) corresponds to events that capture immediate attention but decay immediately, e.g. unexpected and unpopular political events (#spectrial, #nsotu) or occasional media events (#grammys, #oscars); and the lower one (low threshold $\eta^*$) corresponds to events that spread very quickly (they appear one or two days before the peak) and also decay very quickly, e.g. sport events (#nfl, #superbowl). The remaining two classes A (after) and B (before) can both be divided into two groups: (1) low threshold, high decay rate; (2) high threshold, low decay rate. The difference between them is the time at which the users become aware of the events. Events in class A are sudden and people continue to discuss them due to either low decay rate (long lasting), e.g. lobbying marketing campaign (#macheist), or low threshold (easy to spread), e.g. honouring popular stars (#hoppusday). Events in class B depict anticipation, where people already discuss the topics even before their popularity peaks; this contributes to large amounts of activity before the peak, e.g. new feature of Twitter (#plurk) or anticipated show (#poynterday). The events in this class, however, display a scattered pattern and in some rare cases overlap with class S (#therescue).

It needs to be emphasised that the model proposed is straightforward and concise, carrying heuristic and intuitive assumptions on the online behaviours of users, given the knowledge of their social network’s structure. Yet, the model produces the dynamical behaviours observed in real data and allows us to gain insights on the clustering of topics, telling us about the different natures of the contents being circulated in the social media, and how these clusters relate to the classes presented in [14]. This signifies that the three mechanisms included in the model are essential and sufficient in accurately describing the dynamics behind the collective attention of users on a Twitter network.

Knowing the relevant factors that influence the dynamics behind information spreading and trend setting is crucial for various aspects of society, ranging from governance to politics and marketing. Every day, we are overwhelmed with terabytes of information originating from various social media sources as people share news, comments, opinions, and updates in their blogs, microblogs, and homepages, and on Facebook, Twitter, and Instagram, among others. The key for the stakeholders is to know how to manipulate and strategize, if possible, their messages and campaigns such that theirs will stand out to attract attention and not get lost in the vast sea of online information.

What we have presented so far is a model that recaptures the previous trends for certain issues and topics by describing certain attributes of the agents involved in the social network. The next important question is whether or not we can use this knowledge to reshape the trend profiles of the different information types. Our work hints at the importance of knowing the kind of audience on which a product, an idea, or a campaign has possible influence. That aspect is to some extent quantified in our model by the parameters $\lambda$ and $\eta^*$.

IV. CONCLUSIONS 

In this work, we proposed a model using three mechanisms that underlie the tweeting and retweeting behaviours of users on Twitter. These behaviours correspond to perceiving and propagating information in a social network. Despite the simplicity of the model, we are able to capture the general patterns of behaviours observed in real data. In particular, we have not only illustrated the four dynamical classes reported by Lehmann et al. [14] but also demonstrated the existence of further subclasses in three of the classes.
Appendix A: Hashtag type vs. its class


The 88 hashtags used in this study. They belong to 13 types of event. A full description of the meaning of the hashtags can be found in [14].
Class (subclass): A (high $\eta^*$), A (low $\eta^*$), B (high $\eta^*$), B (low $\eta^*$), P (high $\eta^*$), P (low $\eta^*$), S (low $\eta^*$)

Hashtag type (number of hashtags): hashtags

Activism (2): #ie6, #pman
Technology (3): #safari, #safari4, #skype
Charity (2): #twestival, #protest
Sport (6): #masters, #superbowl, #nfl, #superads09, #nfldraft, #superbowlads
Honour (3): #hoppusday, #poynterday, #asot400
Holiday (3): #aprilfools, #easter, #happy09
Convention (10): #rp09, #mix09, #leweb, #macworld, #w2e, #ces, #ces09, #drupalcon, #cebit, #25c3
Awareness (4): #earthday, #earthhour, #therescue, #horadoplaneta
Marketing (5): #glmagic, #free, #macheist, #skittles, #evernote
Media (9): #bsg, #bachelor, #americanidol, #starwarsday, #phish, #grammys, #oscars, #oscar, #watchmen
Political (10): #g20, #rncchair, #teaparty, #spectrial, #nsotu, #budget, #inaug09, #davos, #coalition, #hadopi
Disruption (14): #amazonfail, #peace, #swineflu, #bushfires, #googmayharm, #winneden, #gfail, #gmail, #schiphol, #blackout, #snowmageddon, #mikeyy, #h1n1, #influenza
Twitter (17): #yourtag, #blogger, #socialmedia, #unfollowfriday, #tweepme, #firstfollow, #plurk, #iloveyou, #myfirstjob, #nerdpickup, #oscarwildeday, #3hotwords, #oneword, #crapnames, #followme, #dbi, #politics




